Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
Authors
Abstract
Similar Articles
Data-Based Reinforcement Learning Algorithm with Experience Replay for Solving Constrained Nonzero-Sum Differential Games
In this paper, a partially model-free reinforcement learning (RL) algorithm based on experience replay is developed for finding online the Nash equilibrium solution of multi-player nonzero-sum (NZS) differential games. In order to avoid performance degradation or even system instability, an amplitude limitation on the control inputs is considered in the design procedure. The proposed al...
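For context, here is a minimal sketch of the linear-quadratic nonzero-sum game setup this line of work (and the main article's title) refers to. The symbols A, B_i, Q_i, R_ij and the value matrices P_i are introduced purely for illustration, and the input-amplitude constraints treated in the cited paper are omitted.

```latex
% N-player linear nonzero-sum quadratic differential game (illustrative notation)
\dot{x} = A x + \sum_{j=1}^{N} B_j u_j, \qquad
J_i(u_1,\dots,u_N) = \int_0^\infty \Big( x^\top Q_i x + \sum_{j=1}^{N} u_j^\top R_{ij} u_j \Big)\, dt .
% A set of feedback policies is a Nash equilibrium if no player can lower
% its own cost by deviating unilaterally:
J_i(u_i^\ast, u_{-i}^\ast) \le J_i(u_i, u_{-i}^\ast)
  \quad \text{for all admissible } u_i, \; i = 1,\dots,N .
% In the linear-quadratic case the equilibrium policies are linear state feedbacks
u_i^\ast = -R_{ii}^{-1} B_i^\top P_i\, x ,
% with the matrices P_i determined by a set of coupled algebraic Riccati equations.
```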
Linear Quadratic Zero-Sum Two-Person Differential Games
As in optimal control theory, linear quadratic (LQ) differential games (DG) can be solved, even in high dimension, via a Riccati equation. However, contrary to the control case, existence of the solution of the Riccati equation is not necessary for the existence of a closed-loop saddle point. One may “survive” a particular, non-generic, type of conjugate point. An important application of LQDG’...
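As a reference point, the standard soft-constrained zero-sum formulation behind that statement can be sketched as follows; the notation (A, B, D, Q, R, γ) is illustrative and not necessarily the paper's.

```latex
% Two-player zero-sum LQ game: u minimizes, w maximizes (illustrative notation)
\dot{x} = A x + B u + D w, \qquad
J(u,w) = \int_0^\infty \big( x^\top Q x + u^\top R u - \gamma^2 w^\top w \big)\, dt .
% When it admits a stabilizing solution P, the game algebraic Riccati equation
A^\top P + P A + Q - P B R^{-1} B^\top P + \gamma^{-2} P D D^\top P = 0
% yields the closed-loop saddle point
u^\ast = -R^{-1} B^\top P x, \qquad w^\ast = \gamma^{-2} D^\top P x .
% The quoted abstract's point is that a closed-loop saddle point may exist
% even when this Riccati equation has no solution.
```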
Numerical Approximations for Nonzero-Sum Stochastic Differential Games
The Markov chain approximation method is a widely used and efficient family of methods for the numerical solution of a large class of stochastic control problems in continuous time for reflected-jump-diffusion-type models. It converges under broad conditions, and there are good algorithms for solving the numerical approximations if the dimension is not too high. It has been extended to zero-sum st...
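For concreteness, here is a tiny sketch of the locally consistent transition probabilities this family of methods is built on, for a scalar single-controller diffusion; the function name and parameters are illustrative, and the cited paper's game setting is more general.

```python
def mca_transitions(b, sigma, h):
    """Upwind Markov chain approximation for a scalar diffusion
    dx = b dt + sigma dW on a grid of spacing h (illustrative,
    single-controller sketch).  Returns the probabilities of stepping
    to x+h and x-h and the interpolation interval dt; the chain's
    one-step mean and variance match b*dt and sigma^2*dt to first
    order in h (local consistency)."""
    b_plus, b_minus = max(b, 0.0), max(-b, 0.0)
    q = sigma**2 + h * abs(b)                 # normalizing factor
    p_up = (0.5 * sigma**2 + h * b_plus) / q
    p_down = (0.5 * sigma**2 + h * b_minus) / q
    dt = h**2 / q
    return p_up, p_down, dt

# Local consistency check: the one-step mean h*(p_up - p_down) equals b*dt.
p_up, p_down, dt = mca_transitions(b=1.0, sigma=0.5, h=0.05)
print(0.05 * (p_up - p_down), 1.0 * dt)
```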
Nonzero-Sum Stochastic Games
This paper extends the basic work that has been done on zero-sum stochastic games to those that are nonzero-sum. Appropriately defined equilibrium points are shown to exist for both the case where the players seek to maximize the total value of their discounted period rewards and the case where they wish to maximize their average reward per period. For the latter case, conditions required on the...
Adaptive Linear Quadratic Control Using Policy Iteration
In this paper we present stability and convergence results for Dynamic Programming-based reinforcement learning applied to Linear Quadratic Regulation (LQR). The specific algorithm we analyze is based on Q-learning and it is proven to converge to the optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. The performance of ...
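To make the policy-iteration structure concrete, here is a minimal model-based sketch in the discrete-time LQR setting. The system matrices, the helper name, and the use of scipy are assumptions for illustration only; in the cited Q-learning algorithm the Q-function matrix H would be estimated from input/state data under persistent excitation rather than computed from (A, B).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_iteration_lqr(A, B, Q, R, K0, iters=20):
    """Policy iteration for discrete-time LQR (illustrative, model-based).

    Each iteration evaluates the current gain K by solving a Lyapunov
    equation for P_K, forms the Q-function blocks
        H_ux = B' P A,   H_uu = R + B' P B,
    and improves the policy as K <- inv(H_uu) @ H_ux, i.e. the greedy
    policy for the current Q-function.  The cited paper estimates these
    quantities from data instead of using the model (A, B)."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: (A-BK)' P (A-BK) - P + (Q + K'RK) = 0
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        H_ux = B.T @ P @ A                 # cross block of the Q-function
        H_uu = R + B.T @ P @ B             # control block of the Q-function
        K = np.linalg.solve(H_uu, H_ux)    # policy improvement (u = -K x)
    return K, P

# Small example: the open-loop system is stable, so K0 = 0 is admissible.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K, P = policy_iteration_lqr(A, B, Q, R, K0=np.zeros((1, 2)))
print("converged gain K:\n", K)
```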
Journal
Journal title: Science China Information Sciences
Year: 2019
ISSN: 1674-733X,1869-1919
DOI: 10.1007/s11432-018-9602-1